AD-A247  941 


WL-TR-91-1146 


STRETCH  AND  HAMMER  NEURAL  NETWORKS 
FOR  N-DIMENSIONAL  DATA  GENERALIZATION 


Peter  G.  Raeth 

Electronic  Warfare  Support  Measures  Research  Group 
Avionics  Directorate  (WL/AAWP-1) 

USAF  Wright  Laboratory 
Wright-Patterson AFB, Ohio 45433-6543

Steven  C.  Gustafson,  Gordon  R.  Little,  Todd  S.  Puterbaugh 

University  of  Dayton  Research  Institute 

Kettering  Laboratory:  KL-463 

300  College  Park 

Dayton,  Ohio  45469-0140 


15  January  1992 


Final  Report  For  Period  May  1991  -  October  1991 


Approved  for  Public  Release;  Distribution  Is  Unlimited. 


Avionics  Directorate 

Wright  Laboratory 

Air  Force  Systems  Command 

Wright-Patterson  Air  Force  Base,  Ohio  45433-6543 


NOTICE 


When  Government  drawings,  specifications,  or  other  data  are  used  for  any  purpose  other 
than  in  connection  with  a  definitely  Government-related  procurement,  the  United  States 
Government  incurs  no  responsibility  nor  any  obligation  whatsoever.  The  fact  that  the 
government  may  have  formulated,  or  in  any  way  supplied  the  said  drawings,  specifications, 
or  other  data,  is  not  to  be  regarded  by  implication  or  otherwise  in  any  manner  construed, 
as  licensing  the  holder  or  any  other  person  or  corporation,  or  as  conveying  any  rights  or 
permission  to  manufacture,  use,  or  sell  any  patented  invention  that  may  in  any  way  be 
related thereto.

This  report  is  releasable  to  the  National  Technical  Information  Service  (NTIS).  At  NTIS, 
it  will  be  available  to  the  general  public,  including  foreign  nations. 

This  technical  report  has  been  reviewed  and  is  approved  for  publication. 


PETER  G.  RAETH,  MAJOR,  USAF 
Program  Manager 

ESM  Research  Group  Avionics  Directorate 
Passive  Electronic  Countermeasures  Branch 


PAUL  S.  HADORN,  PhD,  Chief 
Passive  ECM  branch,  EW  Div. 
Avionics  Directorate 


JOHN SEWARD, LT COL, USAF
Chief,  Electronic  Warfare  Division 
Avionics  Directorate 
Wright  Laboratory 


If  your  address  has  changed,  if  you  wish  to  be  removed  from  our  mailing  list,  or  if  the 
addressee is no longer employed by your organization, please notify WL/AAWP,
Wright-Patterson AFB, OH 45433-6543 to help us maintain a current mailing list.

Copies  of  this  report  should  not  be  returned  unless  return  is  required  by  security 
considerations,  contractual  obligations,  or  notice  on  a  specific  document. 


REPORT  DOCUMENTATION  PAGE 

Form Approved
OMB No. 0704-0188

Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503.

1. AGENCY USE ONLY (Leave blank)

2. REPORT DATE: January 1992

3. REPORT TYPE AND DATES COVERED: Final Report, May 91 to Oct 91

4. TITLE AND SUBTITLE: STRETCH AND HAMMER NEURAL NETWORKS FOR N-DIMENSIONAL DATA GENERALIZATION

5. FUNDING NUMBERS

6. AUTHOR(S): Peter G. Raeth, Steven C. Gustafson, Gordon R. Little, Todd S. Puterbaugh

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Avionics Directorate; University of Dayton Research Institute

8. PERFORMING ORGANIZATION REPORT NUMBER: WL-TR-91-1146

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): Avionics Directorate, Wright Laboratory, Wright-Patterson AFB OH 45433-6543; University of Dayton Research Institute, 300 College Park, Dayton OH 45469-0140

10. SPONSORING/MONITORING AGENCY REPORT NUMBER

11. SUPPLEMENTARY NOTES

12a. DISTRIBUTION/AVAILABILITY STATEMENT: Approved for Public Release; Distribution is unlimited.

12b. DISTRIBUTION CODE

13. ABSTRACT (Maximum 200 words)

A hyper-surface stretch and hammer neural network has been developed that generalizes
data from processes that have one output variable and one or more input variables.
This network achieves several desirable properties through a novel combination of
standard methods. The methods incorporate principal components, linear least
squares, Gaussian radial basis functions, and diagonally dominant matrices. An
easily visualized physical model of network function ensures that the combination
of methods is appropriate and practical. The model has natural potential for
parallel implementation and for n-dimensional classification and other pattern
recognition tasks. These tasks include smoothing (interpolation), filtering, and
prediction (extrapolation). The model can be extended to accommodate multiple
outputs. Unlike many other neural networks (such as backpropagation-trained
networks), the training and performance characteristics of the stretch and hammer
neural network are predictable. Trials on three-dimensional surface interpolation
are also presented, as are notes on other potential applications.

14. SUBJECT TERMS

15. NUMBER OF PAGES: 101

16. PRICE CODE

17. SECURITY CLASSIFICATION OF REPORT: UNCLASSIFIED

18. SECURITY CLASSIFICATION OF THIS PAGE: UNCLASSIFIED

19. SECURITY CLASSIFICATION OF ABSTRACT: UNCLASSIFIED

20. LIMITATION OF ABSTRACT: UL

NSN 7540-01-280-5500

Standard Form 298 (Rev. 2-89)
Prescribed by ANSI Std. Z39-18
298-102


The  authors  are  indebted  to  the  following  people: 


Dr.  John  Loomis  and  Dr.  Allan  Lightman  for  discussions  on  computer  aided  drafting  and 
computer  aided  manufacturing. 

Dr.  David  Flannery,  Mr.  William  Phillips,  and  Mr.  Nicholas  Pequignot  for  the  use  of 
computing  hardware. 

Ms.  Wilma  Barnes  and  Ms.  Debbie  Abks  for  administrative  support. 

Major Robert Landry for facilitating the fellowship during which this research took place.

The sponsors of this research:

University  of  Dayton  Rapid  Prototype  Development  Laboratory 
University  of  Dayton  Applied  Physics  Department 
USAF  Wright  Laboratory  Avionics  Directorate 
USAF  Wright  Laboratory  Materials  Directorate 




Table of Contents

Abstract
Acknowledgements
1. Problem Description
2. Brief Overview of Neural Networks
3. Introduction to Stretch and Hammer Neural Networks
4. Stretch for Data Preparation
5. Hammer for Training
6. Execution for Testing
7. Interpretation as a Neural Network
8. Application to 3D Spaces
9. Data for Network Training and Testing in 3D Solid Modeling
10. Tests on Three Surfaces
11. Observations and Folklore
12. Conclusions Relative to 3D Solid Modeling
13. Two Other Applications
14. Recommendations
References
Author Biographies


List of Figures

2-1 Neural Network Basic Architecture
4-1 Unstretched Training Points
4-2 Stretched Training Points
5-1 Least Squares Fit
5-2 The Four Gaussians
5-3 z’ Modified Gaussians
5-4 Summed z’ Modified Gaussians
5-5 z’ Modified Gaussians Summed With Least Squares Line
6-1 An Interpolation Surface
7-1 SHNN Architecture. Description of Network Notation
7-2 SHNN Architecture. Overview
7-3 SHNN Architecture. Converting to Principal Component Space and Obtaining Position on Least Squares Plane
7-4 SHNN Architecture. Obtaining Position on Gaussian Surface
8-1 Unstretched Coordinates
8-2 Stretched Coordinates
8-3 Least Squares Plane with Training Points
8-4 Least Squares Plane, Sum of Gaussian Basis Functions, and Training Points
8-5 Least Squares Plane, Sum of Gaussian Basis Functions with Coefficients, and Training Points
8-6 Fully Stretched and Hammered Output and Training Points
9-1 z = f(x,y) = sin(x)sin(y) = sin(0.5)sin(0.25)
9-2 A 3x5 Regular Grid
9-3 Dome Based on sin(x)sin(y)
9-4 A 3x5 Training and Test Grid
10-1 A Low Complexity Surface
10-2 RMS Precision with Increasing Grid Density
10-3 Worst Precision with Increasing Grid Density
10-4 A Medium Complexity Surface
10-5 RMS Precision with Increasing Grid Density
10-6 Worst Precision with Increasing Grid Density
10-7 A High Complexity Surface
10-8 RMS Precision with Increasing Grid Density
10-9 Worst Precision with Increasing Grid Density
13-1 Optical Correlator


1.  Problem  Description 


A  critical  problem  in  the  link  between  computer-aided-drafting  and 
computer-aided-manufacturing  (CAD/CAM)  is  the  volume  and  complexity  of  data  that  must 
be  sent  between  CAD  computers  and  rapid  prototyping  machines.  The  research  project 
reported  here  was  designed  to  apply  neural  networks  to  this  problem.  The  basic  question 
was,  "Do  neural  networks  enable  a  decrease  in  the  grid  sampling  density  for  surface 
interpolation  in  solid  modeling  for  CAD/CAM?" 

Rapid  prototype  development  (RPD)  is  a  computer-aided-manufacturing  technique  for 
producing  pre-production  parts  directly  from  the  parts’  CAD  representation.  The  technique 
is  also  used  for  small  batch  runs  and  for  producing  molds.  According  to  Kirshman  et  al., 
"Rapid  prototyping  technology  is  perhaps  the  most  significant  new  concept  in  manufacturing 
since  numerical  control  machine  tools."  According  to  Hull,  there  are  seven  critical  areas  in 
rapid  prototyping: 

1)  Part  size 

2)  Building  speed 

3)  Building  accuracy 

4)  Physical  properties  of  formed  parts 

5)  Ease  of  use 

6)  Reliability 

7)  Process  benefits  and  costs  in  the  overall  manufacturing  framework. 

The  research  reported  here  was  mainly  concerned  with  building  accuracy  but  building  speed, 
ease  of  use,  and  reliability  were  also  considered.  It  had  been  hoped  that  process  benefits 
and costs also would be addressed, but an advantage over traditional interpolation schemes
for  this  application  was  not  shown  (see  Conclusions). 

Reliability  was  addressed  by  using  a  neural  network  whose  training  and  performance 
characteristics  are  predictable.  The  stretch  and  hammer  neural  network  (SHNN)  trains  in 
a  number  of  steps  and  uses  an  amount  of  resources  that  are  predictable  before  training 
begins.  The  same  is  true  of  performance  once  training  is  completed.  The  amount  of 
resources  and  the  throughput  time  are  known  before  the  network  is  trained  or  implemented. 

Ease  of  use  was  addressed  in  that  the  SHNN  does  not  require  the  user  to  set  any  internal 
network  parameters.  The  SHNN  is  fully  self-adjusting  to  the  problem  at  hand. 

Not  all  neural  networks  have  these  features  of  reliability  and  ease  of  use. 

Building  speed  is  affected  by  the  amount  and  complexity  of  data  that  must  be  transferred 
to  the  rapid  prototyping  machine  from  the  CAD  computer.  The  SHNN  was  investigated  as 
a  way  to  minimize  the  density  of  the  sampling  grid  needed  to  represent  the  surface  of  a 
given part. An alternative considered was using the SHNN to permit larger surface
facets. In both cases, it was hoped that data of less volume and
complexity would need to be transferred. Hull states that past improvements in this area
have come from various data compression schemes and faster data communications. He
maintains that data preparation is still the slowest portion of the CAD/RPD process for
complex parts.


For  building  accuracy  we  investigated  the  amount  of  data  (and  thus  the  density  of  the 
sampling grid) needed to achieve CAD/CAM precision (0.001). The hope was that
CAD/CAM  precision  could  be  achieved  by  the  SHNN  using  a  less  dense  sampling  grid  than 
that  required  by  traditional  interpolation  schemes. 

Donahue  and  Turner  have  also  noted  the  large  file  sizes  that  have  to  be  transferred  for  a 
given  precision.  They  state  that "...  current  information  transfer  methods  coupled  with  the 
differences  in  CAD  representation  schemes  provide  ample  opportunity  for  improvement  in 
the  CAD  to  rapid  prototyping  process ..."  Heller  notes  that,  "One  of  the  largest  hurdles  to 
cross  at  this  stage  of  rapid  modeling  is  the  data  transfer  nightmare."  He  cites  three  major 
solid  modeling  methods: 

1)  Polygonal:  representation  by  a  collection  of  triangular-shaped  facets 

2)  Constructive  solids:  representation  using  standard  shapes  as  building  blocks 

3)  Surfacing:  representation  by  splines  or  polynomials 

The  effort  reported  here  focused  on  the  surfacing  method  and  used  Gaussian  radial  basis 
functions  instead  of  splines  or  polynomials. 


2. Brief Overview of Neural Networks


From  an  applications  point  of  view,  the  field  of  neural  networks  investigates  ways  to 
program  massive  arrays  of  parallel  processors  to  perform  useful  tasks.  The  interest  in  this 
field  stems  from  the  difficulty  of  using  traditional  sequential  software  methods  to  meet 
modern  requirements.  Another  impetus  to  the  field  is  that  the  traditional  uni-processor 
architectures  are  increasingly  the  cause  of  bottlenecks  that  prevent  timely  task  completion, 
especially  in  real-time  environments.  Finally,  neural  networks  can  generalize  from  examples 
composed  of  multiple  data  elements. 

Neural  networks  are  massive  arrays  of  simple  processors  that  execute  in  parallel.  These 
processors  are  typically  arranged  in  layers  (see  Figure  2-1).  The  processors  in  one  layer  are 
usually  fully  connected  with  the  processors  in  the  immediate  neighboring  layers.  A 
processor  is  sometimes  connected  to  itself  and  to  other  processors  in  its  own  layer.  The 
connections  between  processors  have  associated  weights  that  modify  data  flowing  through 
the  connections.  Each  processor  executes  (in  parallel  with  the  other  processors  in  its  own 
layer) a weighted summation or product of its input data elements, an intermediate
non-linear transfer function, and an output function. The "program" of a neural network is
contained  in  the  inter-processor  connection  weights.  There  can  be  hundreds  of  thousands 
of  these  connection  weights. 
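
The per-processor computation described above can be sketched in a few lines. This is only an illustration, not a network from this report: the function name `layer_forward`, the choice of `tanh` as the non-linear transfer function, and the weight values are all assumptions made for the example.

```python
import numpy as np

def layer_forward(inputs, weights, transfer=np.tanh):
    """Evaluate one layer; each row of `weights` belongs to one processor."""
    weighted_sums = weights @ inputs   # weighted summation of the input elements
    return transfer(weighted_sums)     # non-linear transfer function (tanh assumed)

# Two input elements feeding three processors (arbitrary illustrative weights).
x = np.array([0.5, -1.0])
W = np.array([[0.2, 0.4],
              [-0.3, 0.1],
              [1.0, 1.0]])
y = layer_forward(x, W)   # one output per processor
```

All three processors are evaluated by a single matrix product, which mirrors the statement that the processors in a layer execute in parallel.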

Figure 2-1. Neural Network Basic Architecture (diagram label: Processor Nodes)


There are many methods for setting the connection weights (sometimes called training the
network). Not all methods are logistically supportable. (See Raeth. Logistical supportability
refers to modifiability and maintainability in the face of changing requirements.) All
methods  involve  sending  "training"  data  through  the  network.  These  data  represent 
examples  of  the  task  the  network  is  to  accomplish.  As  the  training  data  pass  through  the 
network,  the  weights  are  adjusted  automatically  according  to  one  of  several  training 
methods.  The  network  will  respond  appropriately  to  the  training  data  given  that  the 
network  has  had  a  sufficient  exposure  to  the  training  data.  If  the  training  data  adequately 
represent  the  task  to  be  accomplished,  the  network  also  can  correctly  process  test  data  that 
it  has  not  been  trained  on. 

Neural  networks  are  not  programmed  in  the  traditional  sense.  Rather,  they  adjust 
themselves  to  the  task  at  hand  based  on  examples.  Computers  programmed  via  traditional 
sequential  methods  learn  from  algorithms  composed  of  explicit  task-accomplishment 
instructions.  Neural  networks  are  heuristic  in  nature,  not  algorithmic.  Because  of  this, 
training  a  neural  network  is  not  as  straightforward  as  it  might  first  appear.  There  are  many 
training  methods  and  many  network  architectures.  Depending  on  the  task  at  hand,  a  given 
training  method  and  network  architecture  may  or  may  not  be  appropriate. 

The training and performance reliability of a neural network is of primary concern for
logistical and mission support reasons. Thus, it is necessary to use a neural network
architecture and training method that have predictable training and performance
characteristics. Such a network is the stretch and hammer neural network discussed in this
report.

Klimasauskas et al., Lippmann, Rogers et al., and Rumelhart & McClelland have all written
more detailed introductions to neural networks. Lippmann’s paper is easy to follow and is
widely referenced. The other authors have produced full-length books. Klimasauskas and
Rumelhart & McClelland provide IBM-PC disks with example networks. In order of increasing
length and level of detail, start with Lippmann, then go on to Rogers, and follow up with
Klimasauskas. Rumelhart & McClelland is the most theoretical of the four.


3. Introduction to Stretch and Hammer Neural Networks


Stretch  and  hammer  neural  networks  are  members  of  the  more  general  class  of  radial  basis 
function  networks.  The  Probabilistic  Neural  Network  defined  by  Specht  and  further 
described  by  Maloney  is  also  a  member  of  this  class  as  are  the  networks  described  by 
Zahirniak. 

For  three-dimensional  solid  modeling,  stretch  and  hammer  is  best  understood  as  a  surface 
fitting  or  surface  interpolation  neural  network.  In  this  context  the  network  has  two  phases: 
training  and  operation.  In  any  supervised  neural  network,  the  training  phase  uses  example 
inputs and expected outputs to adjust the weighted connections. When training is completed,
the expected output is produced whenever the example inputs are presented. In solid
modeling,  selected  (x,y)  coordinates  are  used  as  example  inputs  and  the  height  of  the  surface 
above  some  table  (or  baseline)  is  used  as  the  expected  output.  In  testing  or  operation, 
inputs  that  were  not  used  for  training  are  provided  and  an  output  is  produced.  For  solid 
modeling,  the  neural  network  is  expected  to  deliver  as  output  a  very  accurate  height  for  all 
(x,y)  coordinates  of  the  surface  in  question. 

Briefly  stated,  the  training  of  a  stretch  and  hammer  neural  network  is  described  as  follows. 
(More  details  are  provided  in  the  paper  by  Gustafson  et  al.  and  in  later  sections  of  this 
report.)  Orthogonal  coordinates  with  two  horizontal  input  axes  and  one  vertical  output  axis 
are  established.  The  training  points  can  then  be  plotted  in  the  coordinates  of  the  resulting 
three-dimensional space. These points are stretched so that they are evenly distributed in
the  input  (horizontal)  space.  A  malleable  plane  is  positioned  to  minimize  the  sum  of 
squared  vertical  distances  between  the  plane  and  the  training  outputs.  The  malleable  plane 
is  hammered  at  each  training  output  by  directing  the  hammer  along  each  vertical  least- 
squares  line  with  normally  and  radially  distributed  accuracy  using  many  small  strikes.  The 
variances  of  the  resulting  Gaussian  radial  functions  are  set  so  that  the  number  of  strikes  at 
any  training  point  just  exceeds  the  number  of  strikes  at  all  other  training  points.  The 
hammering  is  stopped  when  the  malleable  plane  is  deformed  to  intersect  each  training 
output. 

Testing  is  conducted  by  projecting  the  test  inputs  vertically  from  the  horizontal  plane  to  the 
vertical  surface  generated  during  training.  The  output  is  the  vertical  height  of  the  surface 
from the horizontal plane at the (x,y) coordinates of a given test input.
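
The training and testing flow just described can be sketched for a single input variable. This is a minimal sketch under stated assumptions, not the implementation used in this research: the function names `train_shnn_1d` and `run_shnn_1d` are hypothetical, the inputs are assumed to be stretched already, the Gaussian exponent is assumed to be the squared distance divided by the variance, and one shared width `sigma` is supplied by hand instead of being set by the strike-count rule above.

```python
import numpy as np

def train_shnn_1d(u, z, sigma):
    """Fit a least squares line, then 'hammer' it so it meets every training output."""
    C = np.column_stack([np.ones_like(u), u])      # row i is (1, u_i)
    a, *_ = np.linalg.lstsq(C, z, rcond=None)      # malleable line (a plane in 1-D)
    residual = z - C @ a                           # gap the hammer must close
    # F[k, j]: Gaussian centered on training point j, evaluated at training point k.
    F = np.exp(-((u[:, None] - u[None, :]) ** 2) / sigma**2)
    c = np.linalg.solve(F, residual)               # height of each Gaussian
    return a, c

def run_shnn_1d(u_new, u, a, c, sigma):
    """Height of the hammered line at new inputs."""
    u_new = np.atleast_1d(np.asarray(u_new, dtype=float))
    line = a[0] + a[1] * u_new
    bumps = np.exp(-((u_new[:, None] - u[None, :]) ** 2) / sigma**2)
    return line + bumps @ c

# Illustrative data only (not taken from the report).
u_train = np.array([0.0, 1.0, 2.0, 3.0])
z_train = np.array([0.0, 0.8, 0.9, 0.1])
sigma = 0.6                                        # assumed shared width
a, c = train_shnn_1d(u_train, z_train, sigma)
heights = run_shnn_1d(u_train, u_train, a, c, sigma)   # reproduces z_train
mid_height = run_shnn_1d(1.5, u_train, a, c, sigma)    # interpolated value
```

Training involves one least squares fit and one linear solve, so the amount of work is fixed by the number of training points before training begins.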

Poggio  and  Girosi  have  also  interpreted  neural  network  learning  in  terms  of  hyper-surface 
construction.  Such  an  interpretation  also  can  be  given  to  Specht’s  development  of  the 
Probabilistic  Neural  Network  (PNN)  although  the  surface  developed  by  the  PNN  places 
radial  Gaussian  functions  that  have  one  of  only  two  different  heights  at  each  training  point. 
Thus,  the  PNN  is  less  general  than  the  stretch  and  hammer  neural  network  and  is  useful  for 
classification  but  not  for  continuous  interpolation.  The  SHNN  can  be  used  for  both  tasks. 


4. Stretch for Data Preparation


Aside  from  selecting  appropriate  inputs,  preparing  the  input  data  for  use  in  training  and 
testing  is  perhaps  the  most  critical  process  in  neural  network  operation.  Two  data 
preparation  methods  are  in  common  use:  normalization  and  transformation  by  principal 
components. 

Normalization  takes  the  vector  of  input  values  and  scales  the  elements  so  that  they  are 
bounded to a range of values. Normalization also can be used to ensure that the sum of
input elements is bounded to a fixed value or to a geometric surface. The constraints
imposed  by  many  types  of  neural  networks  require  normalization  of  some  kind  to  be 
performed  on  the  data  inputs,  network  outputs,  or  on  inter-layer  node  outputs. 
Normalization  is  not  required  by  the  SHNN  and  so  it  is  not  discussed  further  in  this  report. 
Normalization  is  typically  discussed  in  the  literature  relative  to  specific  types  of  neural 
networks. 


Principal  component  analysis  is  a  well-known  statistical  technique  that  is  useful  as  an  input 
data  preparation  step  for  the  SHNN.  According  to  Kruskal,  principal  component  analysis 
allows reduction or elimination of indeterminacy. Translational indeterminacy is reduced
by adding various constraints, such as constraints that force the data element mean to zero.
Rotational indeterminacy is reduced by rotating the input vectors to principal coordinates.
Principal coordinate axes form an orthogonal system in which the input data vectors are
uncorrelated. Dilation indeterminacy due to relative scaling of the data elements is
reduced by forcing the sum of the norms of the data elements to unity.

Principal component analysis also can be used to eliminate input data elements that have
little  variation  relative  to  other  elements.  This  function  is  accomplished  by  choosing  for 
elimination  those  input  data  elements  associated  with  the  smallest  magnitude  eigenvalues 
of  the  data  covariance  matrix.  Thus,  the  dimensionality  of  a  problem  may  be  reduced. 
(This  procedure  is  described  further  by  Hecht-Nielsen  and  Hertz,  et  al.)  In  the  SHNN, 
however, the eigenvalues are used to "stretch" the small-variation elements in the principal
component  space  so  that  the  maximum  variation  is  achieved.  Thus,  no  information  is  lost 
and  maximum  use  of  all  available  information  is  achieved. 

A  more  complete  treatment  of  principal  component  analysis  is  given  by  Hotelling.  The 
specific  implementation  of  the  principal  component  transformation  used  in  the  SHNN 
transforms  all  input  data  vectors  based  on  an  analysis  of  the  training  inputs.  This 
"stretching"  transformation  finds  linear  combinations  of  the  training  input  elements  that  are 
optimum  in  that  the  transformed-coordinate  covariance  matrix  is  the  identity  matrix. 

Let $x_{ij}$ be training element $j$ for training vector $i$, where there are $n$ elements $j = 1, 2, \ldots, n$
and $m$ vectors $i = 1, 2, \ldots, m$. Here, $n$ is strictly less than $m$ and the elements are assumed
to be real.

The mean inputs are

$$\bar{x}_j = \frac{x_{1j} + x_{2j} + \cdots + x_{mj}}{m}$$

The inputs relative to the means are

$$x'_{ij} = x_{ij} - \bar{x}_j$$

Let $X'$ be the matrix $\{x'_{ij}\}$ of inputs relative to the means. The corresponding covariance
matrix $A$ is ($T$ refers to the matrix transpose)

$$A = \frac{X'^{T} X'}{m-1}$$

The orthogonal eigenvectors $v_j$ and the corresponding real eigenvalues $\lambda_j$ associated with
matrix $A$ are the solutions of

$$A v_j = \lambda_j v_j$$

where $v_j$ is column $j$ of the matrix of eigenvectors $\{v_{qj}\}$, $q = 1, 2, \ldots, n$, and the
eigenvalues are solutions of

$$|A - \lambda_j I| = 0$$

Let $u_{ij}$ be input $j$ in principal component coordinates for training vector $i$. Let $u_i$ be the
column vector formed from row $i$ of the matrix $\{u_{ij}\}$. Let $x'_i$ be the column vector formed
from row $i$ of the matrix $\{x'_{ij}\}$. Then the principal component transformation is

$$u_i = B x'_i$$

where the elements of the transformation matrix $B$ are

$$b_{jq} = \lambda_j^{-1/2} v_{qj}$$

The numerical evaluation of the eigenvectors and eigenvalues is best accomplished using
singular value decomposition of the matrix $A$. Note that the transformed inputs $u_{ij}$ are
unitless and that the columns of the matrix $\{u_{ij}\}$ have zero mean, unit variance, and zero
covariance. Singular value decomposition is a mathematical technique for dealing with
systems of equations that are singular or numerically very close to being singular. Our
implementation follows Press, et al., and is supplemented by Nobel and Daniel.
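
The stretching transformation can be sketched with NumPy as follows. This is a minimal sketch, not the implementation that follows Press, et al.: the function names `fit_stretch` and `apply_stretch` and the random test data are assumptions made for the example, and the transformation matrix is taken to be $B = \Lambda^{-1/2} V^{T}$, which is the form that yields an identity covariance matrix.

```python
import numpy as np

def fit_stretch(X):
    """Return (mean, B) such that u = B @ (x - mean) has identity covariance."""
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    mean = X.mean(axis=0)                  # mean inputs
    Xc = X - mean                          # inputs relative to the means
    A = Xc.T @ Xc / (m - 1)                # covariance matrix
    # A is symmetric and positive semi-definite, so its SVD supplies the
    # eigenvectors (columns of V) and the eigenvalues (singular values).
    V, eigvals, _ = np.linalg.svd(A)
    B = (V / np.sqrt(eigvals)).T           # row j: eigenvector j scaled by lambda_j**-0.5
    return mean, B

def apply_stretch(X, mean, B):
    """Transform rows of X (training or test inputs) to stretched coordinates."""
    return (np.asarray(X, dtype=float) - mean) @ B.T

# Correlated, unevenly scaled illustrative inputs (m = 50 vectors, n = 3 elements).
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.1]])
X = rng.normal(size=(50, 3)) @ mixing
mean, B = fit_stretch(X)
U = apply_stretch(X, mean, B)              # zero mean, unit variance, zero covariance
```

The mean and `B` are computed from the training inputs only and then reused for test inputs. The sketch assumes no eigenvalue is zero, since a zero eigenvalue would make the scaling undefined.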

A  simple  two-dimensional  example  that  employs  the  above  algorithm  is  as  follows.  Choose 
the  original  coordinate  values  in  the  training  space  as  shown  below  and  plotted  in  Figure 
4-1. 


x-coordinate    Expected SHNN Output

5.0             1.0
2.0             2.0
3.0             3.0
1.0             4.0

Note  that  the  SHNN  does  not  require  any  particular  order  in  the  training  data.  The  above 
x  coordinates  are  transformed  to  the  coordinates  plotted  in  Figure  4-2. 


x-coordinate  Expected  SHNN  Output 


1.3175  1.0 

-0.4392  2.0 

0.1464  3.0 

-1.0247  4.0 

Figure 4-1. Unstretched Training Points (axis label: Training Points)

Figure 4-2. Stretched Training Points (axis labels: SHNN Expected Output, Stretched Training Points)


Note  that  the  Expected  SHNN  Output  is  not  affected  by  the  coordinate  transformation  and 
that  the  transformed  coordinates  are  in  principal  component  space,  not  the  original  training 
space.  As  desired,  the  average  value  of  the  transformed  element  values  is  0.0  and  the 
covariance  matrix  is  the  identity  matrix,  which  in  this  case  has  a  single  unit  element.  The 
Expected  SHNN  Output  is  defined  by  the  network  user  as  the  value  to  be  produced  by 
SHNN  when  the  given  coordinates  are  applied  as  input. 
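
The stretched values in this example can be checked by hand. With a single input element the covariance matrix is a scalar, so the transformation reduces to subtracting the mean and dividing by the sample standard deviation. The worked arithmetic below assumes the single eigenvector is taken with positive sign.

```latex
\begin{align*}
\bar{x} &= \frac{5.0 + 2.0 + 3.0 + 1.0}{4} = 2.75 \\
x' &= (2.25,\; -0.75,\; 0.25,\; -1.75) \\
A &= \frac{2.25^2 + 0.75^2 + 0.25^2 + 1.75^2}{4 - 1} = \frac{8.75}{3} \approx 2.9167 \\
u_i &= \frac{x'_i}{\sqrt{A}} \approx \frac{x'_i}{1.7078}
  \;\Rightarrow\; u \approx (1.3175,\; -0.4392,\; 0.1464,\; -1.0247)
\end{align*}
```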


5. Hammer for Training


There  are  three  phases  to  the  training  of  a  stretch  and  hammer  neural  network.  Phase  1 
fits  a  least  squares  plane  to  the  training  data.  Phase  2  places  Gaussian  radial  basis  functions 
at  each  training  point.  Phase  3  uses  those  Gaussian  functions  to  "hammer"  the  least  squares 
plane  until  it  contacts  the  expected  network  output  at  each  training  point. 

Phase 1 of SHNN training fits (or positions for "hammering") a linear hyper-plane of $n$
dimensions to the training outputs using a least squares criterion. Depending on specific
requirements, the training data may first be prepared using the principal components
transformation (see Stretch for Data Preparation). Let $z$ be the vector of training outputs
$(z_1, z_2, \ldots, z_m)^T$, where $T$ refers to the vector transpose. Let vector $a = (a_0, a_1, a_2, \ldots, a_n)^T$ be
the coefficients of the linear hyper-plane, where $n$ is the number of elements, $a_0$ is the z-axis
intercept, and $a_1, \ldots, a_n$ are multipliers for the corresponding training vector elements.
Let $C$ be the matrix for which row $i$ is $(1, u_{i1}, u_{i2}, \ldots, u_{in})$, where $u_i$ refers to one of the $m$
training vectors. Then the least squares solution of

$$z = Ca$$

fits the linear hyper-plane to the training outputs. Note that the numerical evaluation of the
unknown elements in vector $a$ is best accomplished using singular value decomposition and
that the solution involves $m$ linear equations in $n + 1$ unknowns, where $m$ is strictly greater
than $n$. Continuing the example from the section "Stretch for Data Preparation," Figure 5-1
shows the least squares line that fits the stretched training points. The vector $l$ of points on
the least squares line is calculated using the following equation after $a$ has been resolved:

$$l = Ca$$

where vector $a = (2.500, -1.073)^T$.
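
The Phase 1 fit for this example can be reproduced with NumPy. This is a sketch for checking the numbers, not the authors' code: `np.linalg.lstsq` is used because it solves the least squares problem through a singular value decomposition, and the variable names are chosen for the example.

```python
import numpy as np

u = np.array([1.3175, -0.4392, 0.1464, -1.0247])   # stretched inputs from the example
z = np.array([1.0, 2.0, 3.0, 4.0])                 # expected SHNN outputs

C = np.column_stack([np.ones_like(u), u])          # row i is (1, u_i)
a, residuals, rank, singular_values = np.linalg.lstsq(C, z, rcond=None)
l = C @ a                                          # points on the least squares line
```

Here `a[0]` is the z-axis intercept and `a[1]` is the multiplier for the single input element. Because the stretched inputs have zero mean, the intercept equals the mean of the training outputs.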

Phase  2  of  SHNN  training  places  Gaussian  radial  basis  functions  at  each  training  point.  The 
variances  for  these  functions  are  adjusted  so  that  the  matrix  of  Gaussian  equation  results 
is  diagonally  dominant  by  columns.  In  traditional  interpolation  methods,  lines,  polynomials, 
or  splines  connect  adjacent  training  points.  The  SHNN  adds  up  a  series  of  Gaussian  curves, 
where  each  curve  is  centered  on  a  single  training  point.  The  variance  of  any  given  curve 
is  such  that  the  number  of  hammering  strikes  at  the  training  point  on  which  the  curve  is 
centered  equals  the  number  of  strikes  at  all  other  training  points. 

Let  F  be  the  matrix  of  Gaussian  functions  which  have  an  output  of  unity  at  their  respective 
training  points  and  an  output  which  decreases  as  the  distance  to  other  training  points 
increases.  It  is  these  functions  which  ultimately,  in  Phase  3,  "hammer"  the  least  squares 
plane  (which  was  fit  to  the  training  points  in  Phase  1)  to  contact  the  training  points. 

Figure 5-1. Least Squares Fit

Each column of the F matrix represents a given training point. Each element of each
column also represents a given training point. Thus, the F matrix is of size m-by-m. The
value stored at each matrix element is the result of the Gaussian function centered at the
column’s training point as calculated at the training point indicated by the element. For
example, column #1 indicates training point #1. The first element of column #1 indicates
the first training point, the second element of column #1 indicates the second training point,
and so on. Thus, column #1, element #2 stores the output for the first training point’s
Gaussian function calculated at the second training point. Note that the diagonal elements
of matrix F are always equal to unity and that the off-diagonal elements are always less than
unity. The Gaussian equation employed to calculate the value of the matrix element is
as follows:


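$$F_{kj} = e^{-\sum_{i=1}^{n} (u_{ji} - u_{ki})^2 / (2\sigma_j^2)}$$

where:

F_{kj} = the element in row k, column j of matrix F
u_{ji} = the ith element of the jth training vector
\sigma_j^2 = the Gaussian variance associated with the jth training point
e = the base of natural logarithms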


For many choices of σ² the matrix F is singular. This results in a final network which does
not  fully  represent  the  training  data.  A  solution  is  to  choose  the  Gaussian  variances  so  as 
to  ensure  that  the  matrix  F  can  never  be  singular.  The  approach  which  requires  the  fewest 
computations  appears  to  be  selecting  the  variances  so  that  the  matrix  F  is  diagonally 
dominant  by  columns.  Columnar  diagonal  dominance  means  that  the  sum  of  the  absolute 
values  of  all  off-diagonal  elements  in  a  given  matrix  column  is  less  than  the  absolute  value 
of  the  diagonal  element  in  that  column. 

Physically, columnar diagonal dominance occurs when the variances are sufficiently narrow
that the Gaussian function’s value at neighboring (and, therefore, distant) points is small.
Since  extremely  narrow  variances  would  result  in  a  pin  cushion  interpolating  surface  having 
poor  smoothness,  it  is  reasonable  to  attempt  to  make  the  variances  as  large  as  possible  while 
maintaining  diagonal  dominance.  This  condition  is  achieved  using  a  short  iterative 
procedure.  This  procedure  sets  each  column’s  variance  to  a  fairly  small  value  and  then  sums 
the  off-diagonal  elements  in  that  column  (in  the  case  of  Gaussians,  the  result  is  always 
positive).  If  the  summation  is  not  less  than  unity  within  some  margin  of  error  (say  0.001), 
then  the  variance  is  appropriately  modified  and  the  addition  is  repeated.  This  procedure 
is  performed  for  each  column  in  the  F  matrix.  The  F  matrix  and  the  variances  for  our 
continuing  example  are  shown  below. 

The F matrix:

1.0000  0.0019  0.1760  0.0055
0.2985  1.0000  0.6477  0.7220
0.5843  0.4988  1.0000  0.2718
0.1165  0.4988  0.1760  1.0000

Variances (σ²):  1.2760  0.2965  0.3948  0.5264

Figure  5-2  shows  the  four  Gaussian  functions  relative  to  the  stretched  training  points  and 
the  least  squares  line. 
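
The iterative variance adjustment described above can be sketched as follows. This is an illustration only: the report does not say how the variance is "appropriately modified," so the bisection search used here is an assumption, as are the function names. Widths are expressed as 2σ², matching the Gaussian exponent, and the sketch assumes at least three distinct training points so that a width reaching an off-diagonal sum of unity exists. The sample points are the stretched training vectors of the Section 8 example.

```python
import math


def gaussian(center, point, two_sigma_sq):
    """Gaussian centered at `center`, evaluated at `point`; unity when they coincide."""
    d2 = sum((c - p) ** 2 for c, p in zip(center, point))
    return math.exp(-d2 / two_sigma_sq)


def off_diagonal_sum(j, points, two_sigma_sq):
    """Sum of column j of F excluding the diagonal element."""
    return sum(gaussian(points[j], points[k], two_sigma_sq)
               for k in range(len(points)) if k != j)


def widest_dominant_width(j, points, margin=0.001):
    """Find 2*sigma^2 for column j with off-diagonal sum in [1 - margin, 1)."""
    if len(points) < 3:
        raise ValueError("this sketch assumes at least three training points")
    lo, hi = 1e-6, 1.0
    # Widen the bracket until the column is no longer diagonally dominant.
    while off_diagonal_sum(j, points, hi) < 1.0:
        lo, hi = hi, hi * 2.0
    # Bisect: the off-diagonal sum grows monotonically with the width.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        s = off_diagonal_sum(j, points, mid)
        if s >= 1.0:
            hi = mid
        elif s < 1.0 - margin:
            lo = mid
        else:
            return mid
    return lo  # fall back to the last width known to be dominant


points = [(0.822, -1.079), (-0.152, -0.290), (-1.354, 0.046), (0.684, 1.323)]
widths = [widest_dominant_width(j, points) for j in range(len(points))]
print([round(w, 3) for w in widths])
```

Each returned width keeps its column's off-diagonal sum just below unity, so the resulting F is diagonally dominant by columns while the Gaussians stay as wide as the margin allows.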

Figure 5-2. The Four Gaussians

Phase 3 of SHNN training uses the Gaussian functions developed in Phase 2 to "hammer"
Phase 1’s least squares plane so that it contacts the training points. The first step in Phase
3 is to develop the vector z’. Each element of z’ is the difference between the value at a
training point and the value of the least squares plane calculated at that training point.
After a has been resolved,

z’ = z - Ca = z - l

Coefficients  for  the  F  matrix  elements  are  calculated  so  that  the  sum  of  the  previously 
developed  Gaussian  functions  calculated  at  each  training  point  equals  the  difference 
between  the  least  squares  plane  and  that  training  point.  This  task  is  accomplished  by 
solving  a  system  of  m  equations  in  m  unknowns: 

z’ = Fb

where b is the vector of Gaussian coefficients, (b1, b2, ..., bm)T.


Since  F  is  guaranteed  to  be  nonsingular,  singular  value  decomposition  need  not  be  used  to 
resolve  b.  We  use  LU  Decomposition  as  implemented  by  Press,  et  al.  Figure  5-3  shows  the 
modified  Gaussians  from  our  continuing  example,  where  z’  and  b  were  determined  as 
follows: 


z’ = (-0.086, -0.971, 0.657, 0.400)T

b = (-0.484, -3.632, 2.242, 1.873)T
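
As a hedged sketch of this step, the system z’ = Fb for the continuing example can be solved with a small LU decomposition. The routine below is a generic Doolittle factorization with partial pivoting written for illustration; it is not the Press et al. code, and the names `lu_decompose` and `lu_solve` are hypothetical. The F matrix and z’ values are those printed for the continuing example.

```python
def lu_decompose(A):
    """Return (LU, perm): combined L (unit diagonal) and U factors plus the row order."""
    n = len(A)
    LU = [list(row) for row in A]
    perm = list(range(n))
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(LU[r][col]))
        if LU[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        LU[col], LU[pivot] = LU[pivot], LU[col]
        perm[col], perm[pivot] = perm[pivot], perm[col]
        for r in range(col + 1, n):
            LU[r][col] /= LU[col][col]          # multiplier stored in the L part
            for c in range(col + 1, n):
                LU[r][c] -= LU[r][col] * LU[col][c]
    return LU, perm


def lu_solve(LU, perm, rhs):
    """Solve A x = rhs using the factors from lu_decompose."""
    n = len(LU)
    y = [rhs[p] for p in perm]                  # apply the row permutation
    for r in range(n):                          # forward substitution
        y[r] -= sum(LU[r][c] * y[c] for c in range(r))
    x = [0.0] * n
    for r in range(n - 1, -1, -1):              # back substitution
        s = sum(LU[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (y[r] - s) / LU[r][r]
    return x


F = [[1.0000, 0.0019, 0.1760, 0.0055],
     [0.2985, 1.0000, 0.6477, 0.7220],
     [0.5843, 0.4988, 1.0000, 0.2718],
     [0.1165, 0.4988, 0.1760, 1.0000]]
z_prime = [-0.086, -0.971, 0.657, 0.400]

LU, perm = lu_decompose(F)
b = lu_solve(LU, perm, z_prime)
print([round(v, 3) for v in b])
```

Because the printed F and z’ are rounded, the computed coefficients should match the b vector above only to about two decimal places.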

Remember  that  each  F  matrix  column  refers  to  the  Gaussian  surrounding  a  specific  training 
point.  The  z’  modification  of  those  Gaussians  permits  their  summation  at  the  training  points 
to  equal  the  difference  between  the  least  squares  plane  and  those  points.  Figure  5-4  shows 
the  summation  of  the  z’  modified  Gaussians. 

Figure 5-3. z’ Modified Gaussians

Figure 5-4. Summed z’ Modified Gaussians


The  last  step  in  Phase  3  is  to  complete  the  "hammering"  by  combining  the  sum  of  the  z’ 
modified  Gaussians  with  the  least  squares  plane.  The  resulting  vector,  z”,  contains  values 
on  a  continuous  surface  at  the  training  points’  coordinates.  This  surface  can  be  used  for 
interpolation  after  a  and  b  have  been  resolved.  Figure  5-5  shows  z”  calculated  for  our 
continuing  example.  Note  that  z”  does  indeed  contact  each  training  point. 

z” = Fb + Ca = z’ + l

Figure 5-5. z’ Modified Gaussians Summed with Least Squares Line


6.  Execution  For  Testing 


The  function  that  the  "hammering"  process  develops  to  exactly  fit  the  training  points  is 
continuous.  Thus,  any  point  that  has  the  same  dimensionality  as  the  hyper-space  generated 
by  the  network  may  be  chosen  and  an  output  calculated. 


Typically,  test  and  training  points  are  chosen  in  the  original  problem  space.  If  the  network 
was  trained  in  principal  component  space,  the  coordinates  of  the  test  points  must  be  mapped 
from  the  original  space  to  principal  component  space.  The  following  equation  performs  this 
mapping: 


p = B(o - x-bar)


where:  p  is  the  test  vector  in  principal  component  space 

o  is  the  test  vector  in  original  space 

B  is  the  transformation  matrix  calculated  as  part  of  the  principal 
components  analysis  during  the  "stretching"  portion  of  training 
x-bar  is  the  vector  of  element  means 

(In  p  and  o,  each  element  contains  a  coordinate  value  for  a  given 
dimension) 


Note  that  if  matrix  B  is  the  identity  matrix  I  and  x-bar  is  the  0  vector,  then  vector  p  is  the 
same  as  vector  o.  These  values  of  B  and  x-bar  are  used  if  the  network  has  been  trained  in 
the  original  problem  space.  Given  this  more  general  understanding  of  B  and  x-bar,  it  is 
clear  that  vector  p  is  simply  the  test  vector  in  original  space  mapped  to  the  appropriate 
space. 
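
A minimal sketch of this mapping, assuming plain Python lists for the vectors and matrix; the function name `to_principal_space` is hypothetical. The numbers are the B matrix, mean vector, and test vector used in the three-dimensional example of Section 8.

```python
def to_principal_space(o, B, x_bar):
    """Compute p = B (o - x_bar) for one test vector o."""
    centered = [oi - xi for oi, xi in zip(o, x_bar)]
    return [sum(bij * cj for bij, cj in zip(row, centered)) for row in B]


B = [[1.874, -1.203],
     [0.216, 0.336]]
x_bar = [3.450, 3.575]
p = to_principal_space([2.300, 1.100], B, x_bar)
print([round(v, 3) for v in p])

# With B = I and x_bar = 0 the mapping leaves the vector unchanged,
# which corresponds to a network trained in the original problem space.
identity = [[1.0, 0.0], [0.0, 1.0]]
same = to_principal_space([2.300, 1.100], identity, [0.0, 0.0])
```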

Once the test vector is in the appropriate space, the following equation generates the
position  on  the  hyper-surface  developed  during  the  "hammering"  portion  of  SHNN  training: 


h  =  gb  +  pa  +  a0 


where:  b  is  the  vector  of  Gaussian  coefficients 

a is the vector of least squares coefficients, (a1, a2, ..., an)T
a0 is the least squares plane z-axis intercept

g  is  the  row  vector  of  Gaussian  training  functions  calculated  at  the 
test  point 

h  is  the  scalar  value  of  the  "hammered"  surface  at  the  coordinates 
of  the  test  point 

p  is  the  test  vector  mapped  to  the  appropriate  space  and  taken  as 
a  row  vector 


Figure  6-1  shows  this  equation  calculated  for  our  continuing  example.  Here,  60  test  points 
were  chosen  in  the  range  -5.0  through  +5.0.  h  was  then  plotted  at  the  coordinates  indicated 
by  the  test  vector  elements.  Note  that  the  SHNN  extrapolates  asymptotically  to  the  least 
squares  plane  as  the  distance  from  the  edge  of  the  training  domain  increases. 
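
The evaluation h = gb + pa + a0 can be sketched as below. This is an illustration under assumptions: the function name `shnn_output` is hypothetical, the widths are supplied as 2σ² per training point, and the parameter values are the rounded ones printed for the three-dimensional example in Section 8, so the surface reproduces the training outputs only to about two decimal places.

```python
import math


def shnn_output(p, centers, two_sigma_sq, b, a):
    """Evaluate h = g.b + a0 + p.a at a test vector p in the training space."""
    # g: each training point's Gaussian evaluated at p.
    g = [math.exp(-sum((c - x) ** 2 for c, x in zip(u, p)) / w)
         for u, w in zip(centers, two_sigma_sq)]
    gaussian_part = sum(gi * bi for gi, bi in zip(g, b))
    plane_part = a[0] + sum(ai * pi for ai, pi in zip(a[1:], p))
    return gaussian_part + plane_part


centers = [(0.822, -1.079), (-0.152, -0.290), (-1.354, 0.046), (0.684, 1.323)]
two_sigma_sq = [3.459, 1.801, 3.451, 4.359]
b = [0.234, -0.480, 0.235, 0.015]
a = [2.500, -0.270, 1.257]           # a[0] is the intercept a0

at_training_points = [shnn_output(u, centers, two_sigma_sq, b, a) for u in centers]
far_away = shnn_output((50.0, 50.0), centers, two_sigma_sq, b, a)
plane_far_away = a[0] + a[1] * 50.0 + a[2] * 50.0
print([round(h, 2) for h in at_training_points])
```

Far from the training points every Gaussian term vanishes, so the output reduces to the least squares plane, which is the extrapolation behavior noted above.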

Figure 6-1. An Interpolation Surface


7.  Interpretation  as  a  Neural  Network 


The  B  matrix  and  the  a  and  b  vectors  developed  during  SHNN  training  can  be  viewed  as 
neural  network  weights.  The  test  vector  in  the  original  space  would  be  the  network  input. 
This  section  discusses  the  resulting  SHNN  neural  network  architecture. 

As indicated in Figure 7-1, the SHNN architecture connects every node in a lower level to
every  node  in  the  layer  above.  The  nodes  in  a  given  layer  are  not  connected  to  other  nodes 
in  that  layer.  Data  flows  through  the  network  with  the  input  at  the  bottom  of  the  figure  and 
the  output  at  the  top.  Weight  values  are  placed  along  the  inter-node  connection  lines  and 
are  subscripted  to  show  that  the  data  flowing  on  the  line  to  the  input  of  one  node  from  the 
output  of  another  are  modified.  The  two-letter  subscript’s  first  letter  is  the  "to"  node  and 
the  second  letter  is  the  "from"  node.  Each  node  typically  executes  a  weighted  summation 
of  its  inputs  and  feeds  the  result  directly  to  its  output.  In  this  case,  no  transfer  or  output 
functions  are  employed  making  for  a  very  simple  node.  In  nodes  that  occur  less  often  in  the 
architecture,  a  transfer  function  also  is  used. 

The equations discussed in Section 6, "Execution for Testing," lend themselves in a natural
way to parallel implementation. Figure 7-2 shows an overview. Two values are summed to
obtain the SHNN output for the test vector, o, given in original space: the value on the
Gaussian surface, gb, and the value on the least squares plane, a0 + pa. A weight of 1 on
each line from the input layer to the single-node output layer indicates that the input layer
serves only to distribute the input lines to the layer above and that the layer performs an
input function that is only an unweighted summation.

Figure 7-1. SHNN Architecture. Description of Network Notation

Figure 7-2. SHNN Architecture. Overview

The z-axis intercept, a0, is calculated during training and needs no further development.
Figure 7-3 contains the architecture for determining the rest of the value on the least
squares plane. This architecture also maps the test vector from original space to principal
component space. The test vector is input at the bottom of the figure. The hidden layer
produces p, the test vector in principal component space. The output layer result is pa.
The weights on the lines to the hidden layer from the input layer are the element values of
the transformation matrix B. Remember, B is an n-x-n matrix, where n is the number of
elements in the test and training vectors. Bji is row j, element i, of matrix B. This
corresponds to the weight on the line to hidden node j from input node i. The weights aj
on the lines going to the output layer from the hidden layer are the least squares coefficients
assigned to the corresponding elements of p.

Figure  7-4  gives  the  architecture  for  determining  the  position  of  p  on  the  Gaussian  surface, 
gb.  Vector  p  is  input  at  the  bottom  of  the  figure.  The  input  layer,  in  this  case,  simply 
serves  to  distribute  the  input  to  the  hidden  layer  nodes.  Thus,  the  lines  to  the  hidden  layer 
from  the  input  layer  are  weighted  at  1.  The  hidden  layer  has  one  node  for  each  of  the 
training  examples.  The  memory  for  each  hidden  layer  node  holds  a  unique  training  vector, 
uk, and its associated variance multiplied by -2 (-2σk², "-2 x Sigma²" in the figures). Each
hidden layer node executes an unweighted input summation whose result is used to drive a
Gaussian transfer function. The hidden layer is connected to the output layer. The weights
between these layers are the Gaussian coefficients, b, each related to a specific training
vector. The output layer executes a weighted summation of the hidden layer output to
obtain gb.

Figure 7-3. SHNN Architecture. Converting to Principal Component Space and Obtaining
Position on Least Squares Plane

Figure 7-4. SHNN Architecture. Obtaining Position on Gaussian Surface


8. Application to 3D Spaces


This  section  presents  an  example  of  the  SHNN  applied  to  a  small  three-dimensional 
problem.  Figures  are  presented  which  show  input  vectors  moving  through  training  and 
testing.  Note  that  in  the  case  of  3D  spaces  two  dimensions  are  used  for  independent 
variables  and  the  third  dimension  is  used  for  the  dependent  variable.  Accordingly,  n  (the 
number of elements or coordinates in the training and test vectors) is equal to 2. The
subscript convention employed for matrix elements is that the first subscript refers to
the matrix row and the second subscript refers to the matrix column.

Stretching 


Consider  a  matrix  of  training  vectors  in  original  space.  The  rows  are  the  m  vectors  and  the 
columns  are  the  n  vector  elements  or  coordinates. 


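$$X = \begin{pmatrix}
X_{11} & X_{12} & \cdots & X_{1n} \\
X_{21} & X_{22} & \cdots & X_{2n} \\
\vdots & \vdots & & \vdots \\
X_{m1} & X_{m2} & \cdots & X_{mn}
\end{pmatrix}$$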

For our example, this matrix reduces as follows. The vector z contains the expected SHNN
output for each training vector. A plot illustrating this set of training vectors is given in
Figure 8-1.

$$X = \begin{pmatrix} 2.3 & 1.1 \\ 3.0 & 3.0 \\ 3.0 & 4.0 \\ 5.5 & 6.2 \end{pmatrix},
\qquad
z = \begin{pmatrix} 1.0 \\ 2.0 \\ 3.0 \\ 4.0 \end{pmatrix}$$

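The vector of column means is:

$$\bar{x} = \begin{pmatrix}
(X_{11} + X_{21} + \cdots + X_{m1})/m \\
(X_{12} + X_{22} + \cdots + X_{m2})/m \\
\vdots \\
(X_{1n} + X_{2n} + \cdots + X_{mn})/m
\end{pmatrix}$$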

Figure 8-1. Unstretched Coordinates (the z-axis is perpendicular to the page)


For  this  example,  the  following  means  were  calculated. 


$$\bar{x} = \begin{pmatrix} 3.450 \\ 3.575 \end{pmatrix}$$


The  matrix  of  input  elements  relative  to  their  means  is: 


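$$X' = \begin{pmatrix}
X_{11} - \bar{x}_1 & X_{12} - \bar{x}_2 & \cdots & X_{1n} - \bar{x}_n \\
X_{21} - \bar{x}_1 & X_{22} - \bar{x}_2 & \cdots & X_{2n} - \bar{x}_n \\
\vdots & \vdots & & \vdots \\
X_{m1} - \bar{x}_1 & X_{m2} - \bar{x}_2 & \cdots & X_{mn} - \bar{x}_n
\end{pmatrix}$$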


This  equation  produces  the  following  matrix  for  our  example. 


$$X' = \begin{pmatrix}
-1.150 & -2.475 \\
-0.450 & -0.575 \\
-0.450 & 0.425 \\
2.050 & 2.625
\end{pmatrix}$$

The corresponding covariance matrix is: (See Lipschutz for a basic discussion on matrix
operations.)

$$A = \frac{1}{m - 1}
\begin{pmatrix}
X'_{11} & X'_{21} & \cdots & X'_{m1} \\
X'_{12} & X'_{22} & \cdots & X'_{m2} \\
\vdots & \vdots & & \vdots \\
X'_{1n} & X'_{2n} & \cdots & X'_{mn}
\end{pmatrix}
\begin{pmatrix}
X'_{11} & X'_{12} & \cdots & X'_{1n} \\
X'_{21} & X'_{22} & \cdots & X'_{2n} \\
\vdots & \vdots & & \vdots \\
X'_{m1} & X'_{m2} & \cdots & X'_{mn}
\end{pmatrix}$$


From this, it is plain to see that the elements of matrix A are computable without actually
creating the transpose of X’. Rather, the following equation may be used:

$$A_{ij} = \frac{\sum_{k=1}^{m} X'_{ki} X'_{kj}}{m - 1}$$


For this example, matrix A becomes:

$$A = \begin{pmatrix} 1.977 & 2.765 \\ 2.765 & 4.509 \end{pmatrix}$$
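
The element-wise covariance calculation can be sketched as follows, assuming the training vectors are held as a list of rows; the function name `covariance` is hypothetical. No transpose is formed: each element A_ij is accumulated directly from the mean-centered columns i and j.

```python
def covariance(X):
    """Return (column means, covariance matrix) for the rows of X."""
    m, n = len(X), len(X[0])
    means = [sum(row[j] for row in X) / m for j in range(n)]
    centered = [[row[j] - means[j] for j in range(n)] for row in X]
    # A[i][j] = sum over the m training vectors of X'[k][i] * X'[k][j], divided by m - 1.
    A = [[sum(r[i] * r[j] for r in centered) / (m - 1) for j in range(n)]
         for i in range(n)]
    return means, A


X = [[2.3, 1.1], [3.0, 3.0], [3.0, 4.0], [5.5, 6.2]]
means, A = covariance(X)
print([round(v, 3) for v in means])
print([[round(v, 3) for v in row] for row in A])
```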

Find the matrix of eigenvectors, V, and the vector of eigenvalues, λ. A discussion of the
required process is beyond the scope of this report. Press et al. give an excellent discussion
on practical means of finding the following eigenvectors and eigenvalues. (Note: The
eigenvectors are the V matrix columns.)

$$V = \begin{pmatrix} 0.842 & 0.540 \\ -0.540 & 0.842 \end{pmatrix},
\qquad
\lambda = (0.202,\ 6.284)$$


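Find the ui’s, the m training vectors mapped from original space to principal component
space.

$$u_i = \begin{pmatrix}
B_{11} & B_{12} & \cdots & B_{1n} \\
B_{21} & B_{22} & \cdots & B_{2n} \\
\vdots & \vdots & & \vdots \\
B_{n1} & B_{n2} & \cdots & B_{nn}
\end{pmatrix}
\begin{pmatrix} X'_{i1} \\ X'_{i2} \\ \vdots \\ X'_{in} \end{pmatrix}$$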
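
The matrix elements Bij are calculated as individual elements of an n-x-n matrix.

$$B = \begin{pmatrix}
V_{11}/\sqrt{\lambda_1} & V_{21}/\sqrt{\lambda_1} & \cdots & V_{n1}/\sqrt{\lambda_1} \\
V_{12}/\sqrt{\lambda_2} & V_{22}/\sqrt{\lambda_2} & \cdots & V_{n2}/\sqrt{\lambda_2} \\
\vdots & \vdots & & \vdots \\
V_{1n}/\sqrt{\lambda_n} & V_{2n}/\sqrt{\lambda_n} & \cdots & V_{nn}/\sqrt{\lambda_n}
\end{pmatrix}$$

where:

V_{ji} = element j of eigenvector i (row j, column i of matrix V)
\lambda_i = the eigenvalue associated with eigenvector i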

For our example, the transformation matrix B becomes:

$$B = \begin{pmatrix} 1.874 & -1.203 \\ 0.216 & 0.336 \end{pmatrix}$$

As a result of these "stretching" calculations, the original set of training vectors is mapped
as shown below and plotted in Figure 8-2.

$$U = \begin{pmatrix}
0.822 & -1.079 \\
-0.152 & -0.290 \\
-1.354 & 0.046 \\
0.684 & 1.323
\end{pmatrix},
\qquad
z = \begin{pmatrix} 1.0 \\ 2.0 \\ 3.0 \\ 4.0 \end{pmatrix}$$

where the rows of matrix U are the ui T.
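
The construction of B and the mapping of the training vectors can be sketched as below. The eigenvectors and eigenvalues are taken as given from the example (no eigen-solver is included), and the function names are hypothetical. The sketch assumes that row i of B is eigenvector i divided by the square root of its eigenvalue, which reproduces the example's B matrix to within rounding.

```python
import math


def transformation_matrix(V, eigenvalues):
    """Row i of B is eigenvector i (column i of V) scaled by 1/sqrt(lambda_i)."""
    n = len(eigenvalues)
    return [[V[j][i] / math.sqrt(eigenvalues[i]) for j in range(n)]
            for i in range(n)]


def stretch(X, B):
    """Map each row of X to principal component space: u = B (x - x_bar)."""
    m, n = len(X), len(X[0])
    means = [sum(row[j] for row in X) / m for j in range(n)]
    return [[sum(B[i][j] * (row[j] - means[j]) for j in range(n))
             for i in range(n)]
            for row in X]


V = [[0.842, 0.540], [-0.540, 0.842]]      # eigenvectors are the columns
eigenvalues = [0.202, 6.284]
X = [[2.3, 1.1], [3.0, 3.0], [3.0, 4.0], [5.5, 6.2]]

B = transformation_matrix(V, eigenvalues)
U = stretch(X, B)
print([[round(v, 3) for v in row] for row in B])
print([[round(v, 3) for v in row] for row in U])
```

Because V and the eigenvalues are printed to three decimal places, the computed U agrees with the tabulated values only to about two decimal places.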

Figure 8-2. Stretched Coordinates


Hammering 


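Take a matrix of training vectors in original or principal component space along with the
corresponding vector of expected SHNN outputs.

$$U = \begin{pmatrix}
u_{11} & u_{12} & \cdots & u_{1n} \\
u_{21} & u_{22} & \cdots & u_{2n} \\
\vdots & \vdots & & \vdots \\
u_{m1} & u_{m2} & \cdots & u_{mn}
\end{pmatrix},
\qquad
z = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_m \end{pmatrix}$$
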

For  our  example,  we  will  continue  with  the  previous  training  vectors  in  principal  component 
space  and  the  same  expected  SHNN  outputs. 

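Fit a least squares plane to the training data. This involves solving an overspecified
(m > n) system of equations.

$$\begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_m \end{pmatrix} =
\begin{pmatrix}
1 & u_{11} & u_{12} & \cdots & u_{1n} \\
1 & u_{21} & u_{22} & \cdots & u_{2n} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & u_{m1} & u_{m2} & \cdots & u_{mn}
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{pmatrix}$$
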

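Our example has this system:

$$\begin{pmatrix} 1.0 \\ 2.0 \\ 3.0 \\ 4.0 \end{pmatrix} =
\begin{pmatrix}
1.0 & 0.822 & -1.079 \\
1.0 & -0.152 & -0.290 \\
1.0 & -1.354 & 0.046 \\
1.0 & 0.684 & 1.323
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix}$$
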

The  solution  to  this  system  is  plotted  in  Figure  8-3  and  has  the  following  a  vector: 


$$a = \begin{pmatrix} 2.500 \\ -0.270 \\ 1.257 \end{pmatrix}$$


Figure 8-3. Least Squares Plane with Training Points

Center Gaussian functions, G(uj,uk), at each of the training points. These functions are
placed in matrix F. The columns represent training points and the rows represent G(uj,uk)
at the column’s training point, uj, calculated at the coordinates of the row training point, uk.

$$F = \begin{pmatrix}
G(u_1,u_1) & G(u_2,u_1) & \cdots & G(u_m,u_1) \\
G(u_1,u_2) & G(u_2,u_2) & \cdots & G(u_m,u_2) \\
\vdots & \vdots & & \vdots \\
G(u_1,u_m) & G(u_2,u_m) & \cdots & G(u_m,u_m)
\end{pmatrix}$$

where: ui refers to training point i
G(uj,uk) is the Gaussian function for training vector j calculated
at training vector k

$$G(u_j,u_k) = e^{-\sum_{i=1}^{n} (u_{ji} - u_{ki})^2 / (2\sigma_j^2)}$$

where:

u_{ji} = training element i of training vector j
\sigma_j^2 = the Gaussian variance associated with the jth training point
e = the base of natural logarithms

Adjust the σ’s until F achieves columnar diagonal dominance. For our example, the
following F matrix was produced. The summed Gaussian curve is compared to the least
squares plane in Figure 8-4.

$$F = \begin{pmatrix}
1.000 & 0.418 & 0.176 & 0.265 \\
0.635 & 1.000 & 0.637 & 0.469 \\
0.177 & 0.421 & 1.000 & 0.265 \\
0.188 & 0.160 & 0.187 & 1.000
\end{pmatrix}$$

$$2\sigma^2 = (3.459,\ 1.801,\ 3.451,\ 4.359)$$


Force the peaks of the Gaussian functions to equal the difference between the least squares
plane and the training points. This is accomplished by solving another system of equations.
First, develop z’, the vector of differences between the training points and the least squares
plane.

$$z' = \begin{pmatrix}
z_1 - a_0 - a_1 u_{11} - a_2 u_{12} - \cdots - a_n u_{1n} \\
z_2 - a_0 - a_1 u_{21} - a_2 u_{22} - \cdots - a_n u_{2n} \\
\vdots \\
z_m - a_0 - a_1 u_{m1} - a_2 u_{m2} - \cdots - a_n u_{mn}
\end{pmatrix}$$

Figure 8-4. Least Squares Plane, Sum of Gaussian Basis Functions, and Training Points


The example produced the following z’ vector:

$$z' = \begin{pmatrix} 0.077 \\ -0.176 \\ 0.078 \\ 0.022 \end{pmatrix}$$

Use  z’  to  solve  for  a  set  of  coefficients  which  will  adjust  the  Gaussian  peaks  so  that  they 
represent  the  difference  between  the  expected  SHNN  output  at  the  training  points  and  the 
value  of  the  least  squares  plane  at  those  points. 


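$$\begin{pmatrix} z'_1 \\ z'_2 \\ \vdots \\ z'_m \end{pmatrix} =
\begin{pmatrix}
F_{11} & F_{12} & \cdots & F_{1m} \\
F_{21} & F_{22} & \cdots & F_{2m} \\
\vdots & \vdots & & \vdots \\
F_{m1} & F_{m2} & \cdots & F_{mm}
\end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}$$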

Our example resolved the b vector as

$$b = \begin{pmatrix} 0.234 \\ -0.480 \\ 0.235 \\ 0.015 \end{pmatrix}$$

The sum of the adjusted Gaussian curves is compared to the least squares plane in Figure
8-5.

Execution


Given  the  previous  calculations,  it  is  now  possible  to  use  the  SHNN  to  calculate  outputs 
based  on  test  vectors  not  used  in  training.  If  the  network  was  trained  with  the  training 
vectors  in  principal  component  space,  the  test  vectors  must  also  be  mapped  to  that  space. 


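$$\begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_n \end{pmatrix} =
\begin{pmatrix}
B_{11} & B_{12} & \cdots & B_{1n} \\
B_{21} & B_{22} & \cdots & B_{2n} \\
\vdots & \vdots & & \vdots \\
B_{n1} & B_{n2} & \cdots & B_{nn}
\end{pmatrix}
\left(
\begin{pmatrix} o_1 \\ o_2 \\ \vdots \\ o_n \end{pmatrix} -
\begin{pmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_n \end{pmatrix}
\right)$$
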

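A single mapping for this example is shown below, assuming that the test vector
o = (2.300, 1.100)T.

$$\begin{pmatrix} 0.822 \\ -1.079 \end{pmatrix} =
\begin{pmatrix} 1.874 & -1.203 \\ 0.216 & 0.336 \end{pmatrix}
\left(
\begin{pmatrix} 2.300 \\ 1.100 \end{pmatrix} -
\begin{pmatrix} 3.450 \\ 3.575 \end{pmatrix}
\right)$$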
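
The appropriately mapped test vectors are used to calculate the SHNN’s output.

$$\text{SHNN Output} =
\left( G(u_1,p),\ G(u_2,p),\ \ldots,\ G(u_m,p) \right)
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}
+
\left( 1,\ p_1,\ p_2,\ \ldots,\ p_n \right)
\begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{pmatrix}$$

where G(uj,p) is the Gaussian function for training vector j calculated at the mapped test
vector p.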

Choosing p, the mapped test vector as calculated above, the following SHNN result is
calculated:

$$1 = (1.000,\ 0.208,\ 0.003,\ 0.003)
\begin{pmatrix} 0.234 \\ -0.480 \\ 0.235 \\ 0.011 \end{pmatrix}
+ (1,\ 0.822,\ -1.079)
\begin{pmatrix} 2.500 \\ -0.270 \\ 1.257 \end{pmatrix}$$



Figure  8-6  shows  the  surface  generated  if  a  regular  matrix  of  test  vectors  is  chosen 
surrounding  the  test  points  (coordinate  axes  bounded  by  -2  through  2  inclusive). 

Figure 8-6. Fully Stretched and Hammered Output and Training Points


9.  Data  for  Network  Training  and  Testing  in  3D  Solid  Modeling 

The  SHNN  network  was  trained  and  tested  for  3D  interpolation  using  data  generated  from 
an  equation  z  =  f(x,y)  where  z  is  height  above  a  table  or  baseline  and  x  and  y  are 
independent  coordinate  axis  values.  For  example,  let  f(x,y)  =  sin(x)sin(y),  x  =  0.5,  and  y 
=  0.25,  then  z  =  f(x,y)  =  sin(x)sin(y)  =  sin(0.5)sin(0.25)  =  0.1186118.  This  point  is  plotted 
in  Figure  9-1. 

Training  data  were  generated  from  a  regular  sampling  grid.  First,  the  limits  of  x  and  y  were 
defined  and  a  grid  density  was  specified  in  terms  of  the  number  of  evenly  spaced  y 
coordinates  required  for  each  of  a  number  of  evenly  spaced  x  coordinates.  For  example, 
if  the  limits  of  both  x  and  y  are  each  0.0  through  1.0  and  a  3-by-5  grid  is  selected,  then  there 
are  15  total  data  samples  as  shown  in  Figure  9-2.  Figure  9-3  extends  this  example  by 
showing  sin(x)sin(y)  for  the  original  limits  but  using  a  20-by-20  regular  sampling  grid.  The 
axes  are  tilted  to  display  the  surface  shape. 

Figure 9-1. z = f(x,y) = sin(0.5)sin(0.25) (the z axis is perpendicular to the page)


All  (x,y)  training  coordinates  for  a  regular  i-by-j  grid  were  generated  using  the  following 
equations: 


$$x_i = \min(x) + \frac{(i - 1)\,(\max(x) - \min(x))}{\max(i) - 1}$$

$$y_j = \min(y) + \frac{(j - 1)\,(\max(y) - \min(y))}{\max(j) - 1}$$

where i = 1, 2, ..., # x coordinates and j = 1, 2, ..., # y coordinates for each x coordinate.

Test  data  were  produced  by  positioning  these  data  relative  to  the  training  data.  The  goal 
in  test  data  production  was  to  cause  the  maximum  loss  of  interpolation  precision.  This  goal 
was  very  easy  to  attain.  It  was  only  necessary  to  put  the  test  points  at  the  maximum  distance 
from  the  training  points  while,  like  the  training  points,  covering  the  entire  surface.  For  grid 
sampling,  the  test  points  were  placed  in  the  center  of  the  grid  blocks  formed  by  the  training 
points.  Figure  9-4  repeats  Figure  9-2  with  the  test  points  entered.  The  test  points  lie  at 
coordinates  (xa,  ya). 

Figure 9-4. A 3x5 Training and Test Grid (x and y training limits = 0 through 1)


In general, xa = x + 0.5(x1 - x) and ya = y + 0.5(y1 - y), where x, y, x1, and y1 define the four
corners of a given grid block as noted in Figure 9-4.

For regular grids, xa+i = x + 0.5(x1 - x) + i(x1 - x) and ya+j = y + 0.5(y1 - y) + j(y1 - y),
where i = 0, 1, ..., (t-2) and j = 0, 1, ..., (u-2). If the training grid is t-by-u (that is, u
y-coordinates for each of t x-coordinates), then the testing grid is (t-1)-by-(u-1), where both t
and u must be equal to 2 or greater.


When  the  testing  data  were  applied  to  the  trained  network,  a  measure  of  interpolation 
precision was calculated. This measure essentially reveals the overall difference between
the trained network result and the actual function result. The same training and
testing  data  and  the  same  measure  of  precision  were  used  for  both  the  SHNN  and 
traditional  interpolation.  The  equations  used  for  precision  calculations  are  given  below. 

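$$\text{Precision} = \frac{\text{CalculatedResult} - \text{InterpolatedResult}}{\text{CalculatedResultsRange}}$$

$$\text{RMS Precision} = \sqrt{\frac{\sum \text{Precision}^2}{\#\text{TestVectors}}}$$

$$\text{Average Precision} = \frac{\sum |\text{Precision}|}{\#\text{TestVectors}}$$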
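
A short sketch of these measures follows. It rests on two assumptions not stated explicitly in the text: the range is taken over the calculated (true function) results at the test points, and the worst precision used in Section 10 is the largest absolute precision value. The function name `precision_report` and the sample numbers are hypothetical.

```python
import math


def precision_report(calculated, interpolated):
    """Return (RMS, average, worst) precision for paired true and network results."""
    value_range = max(calculated) - min(calculated)
    precision = [(c - s) / value_range for c, s in zip(calculated, interpolated)]
    n = len(precision)
    rms = math.sqrt(sum(p * p for p in precision) / n)
    average = sum(abs(p) for p in precision) / n
    worst = max(abs(p) for p in precision)
    return rms, average, worst


# Made-up values purely to exercise the formulas.
rms, average, worst = precision_report([0.0, 1.0, 2.0], [0.0, 1.5, 2.0])
print(rms, average, worst)
```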


10. Tests on Three Surfaces


The SHNN was tested on surfaces of low, medium, and high complexity. The results were
compared to bicubic interpolation. Training grid densities of 3, 4, 5, 6, 7, 8, 9, 15, 20, 25,
and 31 were chosen. The available computing hardware did not permit higher densities. The
training  and  test  points  were  chosen  as  described  in  "Data  For  Network  Training  and 
Testing  in  3D  Solid  Modeling." 

Low  Complexity  Surface 


The  surface  of  low  complexity  is  described  by  the  following  equation: 


/ 

Surface  Height  =  cos 

V 


A  plot  of  this  equation  is  shown  in  Figure  10-1  for  a  50x50  grid  with  the  x  and  y  axes  in  the 
range  of  0  through  1  inclusive.  This  equation  was  chosen  because  it  is  not  symmetric  on 
either  coordinate  axis.  Thus,  there  are  no  duplicated  test  points. 

Figure  10-2  compares  the  RMS  precision  of  the  SHNN  to  that  of  bicubic  interpolation. 
Figure  10-3  compares  the  worst  precision  of  the  SHNN  to  that  of  bicubic  interpolation. 

Figure 10-1. A Low Complexity Surface

Figure 10-2. RMS Precision with Increasing Grid Density

Figure 10-3. Worst Precision with Increasing Grid Density


Medium  Complexity  Surface 


The  surface  of  medium  complexity  used  the  same  basic  equation  as  the  low  complexity 
surface. However, a Gaussian bump was added on each of the four corners. The Gaussian
bumps  were  generated  using  the  following  equation: 

$$\text{Surface Height} = \text{Low Complexity Height} + \sum_{i=1}^{4} 0.025\, e^{-\left((g_{xi} - x)^2 + (g_{yi} - y)^2\right)/0.005}$$

where: g_{xi} and g_{yi} are the x,y coordinates of the Gaussian centers

x and y are the x,y coordinates of a training point

A plot of this surface is shown in Figure 10-4 for a 50x50 grid. The same data ranges and
grids  as  before  were  used. 

Figure  10-5  compares  the  RMS  precision  of  the  SHNN  to  that  of  bicubic  interpolation. 
Figure  10-6  compares  the  worst  precision  of  the  SHNN  to  that  of  bicubic  interpolation. 

Figure 10-4. A Medium Complexity Surface

Figure 10-5. RMS Precision with Increasing Grid Density

Figure 10-6. Worst Precision with Increasing Grid Density


High Complexity Surface


The  surface  of  high  complexity  used  the  same  basic  equation  as  the  low  complexity  surface. 
However,  Gaussian  bumps  were  added  on  an  8x8  grid  which  spanned  the  entire  surface. 
The  Gaussian  bumps  were  generated  using  the  following  equation: 

$$\text{Surface Height} = \text{Low Complexity Height} + \sum_{i=1}^{64} 10.0\, e^{-\left((g_{xi} - x)^2 + (g_{yi} - y)^2\right)/0.002}$$

where: g_{xi} and g_{yi} are the x,y coordinates of the Gaussian centers

x and y are the x,y coordinates of a training point

A plot of this surface is shown in Figure 10-7 for a 50x50 grid. The same data ranges and
grids as before were used.

Figure 10-8 compares the RMS precision of the SHNN to that of bicubic interpolation.
Figure 10-9 compares the worst precision of the SHNN to that of bicubic interpolation.
Figure 10-7. A High Complexity Surface


Figure 10-8. RMS Precision with Increasing Grid Density (with 64 Gaussian Functions Overlaid)


Figure 10-9. Worst Precision with Increasing Grid Density


11. Observations and Folklore

A key to the accurate functioning of neural networks based on Gaussian radial basis
functions  is  appropriately  setting  the  Gaussian  variances.  Specht’s  PNN  has  a  single 
variance  that  is  set  by  the  user.  The  SHNN  calculates  a  variance  that  is,  generally,  different 
for  each  training  point.  Thus,  although  the  user  could  set  desired  widths,  the  SHNN  can 
calculate  optimal  widths  for  best  performance. 

Only  one  pass  through  the  training  data  is  needed  by  the  SHNN.  This  pass  includes  four 
places  where  iterations  occur: 

a) Finding the eigenvectors and eigenvalues

b) Solving the equations to fit the least squares plane to the training data

c) Adjusting the Gaussian variances to achieve columnar diagonal dominance

d) Solving the equations to determine the Gaussian coefficients

The  longest  iteration  occurs  when  solving  the  equations  that  determine  the  Gaussian 
coefficients. 

Solutions to systems of linear equations are generally of O(N^3/c) computational complexity,
where N is the number of equations and unknowns in a completely specified system and c is
a constant that depends on the method. Singular Value Decomposition (SVD) is O(N^3/3) and
Lower-triangular/Upper-triangular Decomposition (LUD) is O(N^3/15), according to our
experience with those two methods.
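
For readers who want to compare the two approaches on their own systems, the following is a minimal Python sketch using NumPy rather than the routines used in this study. `solve_lu` relies on `numpy.linalg.solve`, which uses an LU factorization internally; `solve_svd` builds the solution from the singular value decomposition and discards very small singular values. The function names and the `rcond` cutoff are assumptions for illustration.

```python
import numpy as np

def solve_lu(a, b):
    """Solve a @ x = b with NumPy's LU-based dense solver."""
    return np.linalg.solve(a, b)

def solve_svd(a, b, rcond=1e-12):
    """Solve a @ x = b through the SVD, zeroing negligible singular values."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    keep = s > rcond * s.max()
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    # x = V diag(1/s) U^T b
    return vt.T @ (s_inv * (u.T @ b))
```

Timing both functions for several values of N is one way to estimate the constant c for a particular machine and library.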

When calculating columnar diagonal dominance, it is possible to use off-diagonal sums of
less than 1. This possibility was investigated briefly. Our observation from a few
experiments is that, while RMS precision is not greatly affected, there are opportunities to
improve worst precision. The implication is that there may be ways to fine-tune the SHNN
to minimize gross errors. Remember, all neural networks are heuristic by nature. They do
not compute precise answers according to some algorithm. On the few surfaces we tried,
an off-diagonal sum of 0.4 (rather than unity) worked best if there was any effect at all.
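
The role of the off-diagonal sum can be illustrated with a short Python sketch. It is not the procedure used in this study. It assumes a Gaussian matrix whose entry (i, j) is exp(-d_ij^2 / variance_j), where d_ij is the distance between training points i and j, so that every diagonal entry is 1. For each column, the variance is then found by bisection so that the off-diagonal entries sum to a chosen target. The function name, the form of the Gaussian, and the bracketing limits are all assumptions.

```python
import numpy as np

def variances_for_column_dominance(points, target_sum=1.0, iterations=60):
    """Per-point variances giving each column an off-diagonal sum of target_sum.

    Requires distinct points and target_sum smaller than the number of
    other points, since each off-diagonal entry is below 1.
    """
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)          # squared pairwise distances
    variances = np.empty(len(pts))
    for j in range(len(pts)):
        off = np.delete(d2[:, j], j)         # squared distances to the other points
        lo, hi = 1e-12, 1e12                 # assumed bracket for the variance
        for _ in range(iterations):
            mid = np.sqrt(lo * hi)           # bisect on a logarithmic scale
            if np.sum(np.exp(-off / mid)) > target_sum:
                hi = mid                     # too much overlap: narrow the Gaussian
            else:
                lo = mid                     # too little overlap: widen it
        variances[j] = np.sqrt(lo * hi)
    return variances
```

Calling the function with `target_sum=0.4` instead of `1.0` yields narrower Gaussians, which is the kind of adjustment discussed in the paragraph above.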


When  using  SVD  for  solving  systems  of  equations,  we  have  found  that  solutions  to  systems 
of  equations  are  more  readily  achieved  if  the  training  vectors  are  first  passed  through  a 
principal  components  transformation.  In  addition,  we  have  found  that  the  principal 
components  transformation  does  not  affect  interpolation  accuracy  when  regular  grid 
sampling  is  used.  However,  even  with  regular  grids,  the  principal  components 
transformation  speeds  up  the  solution  of  systems  of  equations  when  SVD  is  used.  In  fact, 
SVD  would  not  solve  some  systems  we  attempted  without  the  principal  components 
transformation. 
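
A principal components transformation of the kind referred to above can be sketched in Python as follows. This is a generic formulation (center the training vectors, then rotate them onto the eigenvectors of their covariance matrix) and is not taken from the code used in this study; the function name and return values are assumptions.

```python
import numpy as np

def principal_components_transform(vectors):
    """Rotate centered training vectors onto their principal axes.

    Returns the transformed vectors, the mean that was removed, and the
    matrix of eigenvectors (one per column, largest eigenvalue first).
    """
    x = np.asarray(vectors, dtype=float)
    mean = x.mean(axis=0)
    centered = x - mean
    cov = np.atleast_2d(np.cov(centered, rowvar=False))
    eigenvalues, eigenvectors = np.linalg.eigh(cov)   # symmetric eigenproblem
    order = np.argsort(eigenvalues)[::-1]             # sort by decreasing variance
    axes = eigenvectors[:, order]
    return centered @ axes, mean, axes
```

Because the transformation is a shift followed by a rotation, distances between training vectors are unchanged. Test vectors must be shifted by the same mean and rotated by the same axes before they are applied to the network.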

It is best to use the C language source checker called "Lint" on all code, especially that which
is commercially procured. The version of Lint used must be oriented to your specific
compiler. During this project, we were often surprised at how fragile C source can be due
to compilable and executable coding errors such as coercion, promotion, and demotion.
Lint will usually catch these types of errors. C compilers typically will not.
12. Conclusions Relative to 3D Solid Modeling

The  authors  have  come  to  the  following  conclusions  relative  to  3D  solid  modeling. 

For simple surfaces, traditional interpolation methods yield precision superior to that
achievable by stretch and hammer’s current implementation. (Bicubic interpolation was
used as a baseline.) As the surface becomes more complex, the stretch and hammer neural
network yields comparable precision.

For  complex  n-dimensional  problems  where  the  problem  space  cannot  be  sampled  using  an 
arbitrarily  dense  grid,  traditional  algorithms  grow  in  complexity  and  execution  time  as  n 
increases.  Traditional  methods  grow  in  execution  time  as  the  number  of  training  samples 
grows.  The  stretch  and  hammer  neural  network  remains  fixed  in  complexity  (but  not  size) 
as  n  and  the  number  of  training  samples  grows.  The  SHNN  remains  fixed  in  execution  time 
as  n  and  the  number  of  training  samples  grows  if  true  neural  hardware  is  used  for 
implementation. 

Traditional  interpolation  methods  cannot  extrapolate  outside  their  training  data  without 
making  certain  assumptions  and,  in  effect,  adding  training  data.  If  the  test  data  fall  outside 
the  training  data,  traditional  methods  cannot  interpolate  a  surface  height  without  the  added 
training  data.  Thus,  input  data  tolerances  can  cause  values  outside  the  realm  of 
computability for traditional methods. The stretch and hammer neural network easily
generalizes so that input data tolerances are accommodated.


Finally, stretch and hammer’s parallel nature allows it to take advantage of off-the-shelf and
future  parallel  computing  hardware  such  as  transputers  and  neural  chips.  Thus,  there  is 
strong  reason  to  believe  that  the  SHNN  can  execute  in  real  time  despite  the  problem 
complexity.  Such  may  not  be  possible  with  traditional  interpolation  methods. 


13. Two Other Applications


Filtering  and  classification  are  two  other  areas  to  which  the  SHNN  may  have  useful 
application. 


Filtering 


An  application  of  the  stretch  and  hammer  neural  network  is  the  synthesis  of  filters  for  an 
optical  correlator  used  for  pattern  recognition.  Figure  13-1  shows  an  optical  correlator. 


Figure 13-1. Optical Correlator (input scene containing target pattern: a square)

Coherent light from a pixelated input scene containing a target pattern passes through a
simple  lens  one  focal  length  away  from  the  input.  One  focal  length  behind  the  lens,  the 
Fourier  transform  (or  spatial  frequency  spectrum)  of  the  input  scene  is  imaged.  A  pixelated 
transparency  filter  made  from  the  Fourier  transform  of  the  target  pattern  to  be  recognized 
is  also  placed  in  this  plane.  Light  passing  through  the  filter  is  now  the  product  of  the 
Fourier  transform  of  the  input  scene  and  the  filter  transparency.  This  light  is  then  passed 
through  another  lens  placed  one  focal  length  away.  One  focal  length  behind  this  second 
lens  an  image  showing  the  correlation  between  the  input  scene  and  the  target  pattern  is 
formed.  Points  of  high  correlation  appear  as  bright  spots  on  this  output  plane.  Points  on 
the  input  plane  and  the  output  plane  can  be  matched  in  a  one-to-one  manner.  Points  of 
high  correlation  in  the  output  plane  (bright  points)  can  be  related  to  specific  positions  on 
the  input  plane  that  indicate  the  presence  of  the  target  pattern. 
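
The operation performed by the correlator has a well-known digital analogue, sketched below in Python with NumPy. The scene and target are Fourier transformed, the scene spectrum is multiplied by a filter made from the target spectrum (here its complex conjugate, the usual matched-filter choice, which is an assumption about the filter design), and the product is transformed back. The function name and the small test scene are hypothetical, and the correlation is circular because the FFT treats the scene as periodic.

```python
import numpy as np

def correlate_with_target(scene, target):
    """Correlate a scene with a target pattern through the Fourier plane."""
    scene = np.asarray(scene, dtype=float)
    target = np.asarray(target, dtype=float)
    padded = np.zeros_like(scene)                     # target placed at the origin
    padded[:target.shape[0], :target.shape[1]] = target
    scene_spectrum = np.fft.fft2(scene)               # role of the first lens
    matched_filter = np.conj(np.fft.fft2(padded))     # role of the filter transparency
    # Role of the second lens; an inverse transform keeps input and output
    # coordinates aligned in this digital version.
    return np.real(np.fft.ifft2(scene_spectrum * matched_filter))

# A 16x16 scene containing a 3x3 square whose upper-left corner is at (5, 7).
scene = np.zeros((16, 16))
scene[5:8, 7:10] = 1.0
target = np.ones((3, 3))
output_plane = correlate_with_target(scene, target)
peak = np.unravel_index(np.argmax(output_plane), output_plane.shape)
```

The brightest point of `output_plane` marks the offset at which the target best matches the scene, which corresponds to the one-to-one matching of input and output points described above.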

One  major  problem  with  this  method  is  that  the  filter  is  made  for  a  target  pattern  of  specific 
orientation  and  scale  (or  size).  If  the  target  pattern  in  the  input  scene  rotates  or  changes 
scale,  then  the  original  filter  does  not  produce  a  strong  correlation  point  in  the  output  plane 
for  the  newly  oriented  target.  It  is  possible  that  the  SHNN  can  be  used  to  synthesize  new, 
more  appropriate  filters  as  the  orientation  and  scale  of  the  target  pattern  changes  within  the 
input  scene.  The  input  to  the  SHNN  is  data  from  the  input  scene  which  contains 
information  about  the  orientation  and  scale  of  the  target  pattern.  The  output  from  the 
network is information about how to synthesize a new filter.

One specific implementation now being investigated at the University of Dayton Research
Institute  uses  the  following  steps.  First,  it  is  assumed  that  the  target  pattern  has  been 
previously  located  in  the  input  plane,  but  has  since  changed  its  orientation  and  scale.  The 
values  of  the  input  scene  in  the  region  around  the  target  location  are  sampled  using  a 
rectangular  grid  of  points.  The  values  from  the  sampling  grid  are  used  as  the  input  values 
for  the  neural  network.  The  network  is  trained  by  sampling  the  target  pattern  at 
orientations  and  scales  for  which  the  proper  filter  is  known.  The  outputs  of  the  network  are 
the  pixel  values  for  the  new  filter  which  matches  the  sampled  input  from  the  reoriented  and 
rescaled  target  pattern. 

Classification 

The SHNN may be employed to solve classification problems since classification is a subset
of interpolation. In interpolation, the neural network determines the "height" of some
hyper-surface at coordinates not included in the training set. In classification, the
hyper-surface consists of "heights" that correspond to given classes. For instance, all coordinates
(called  features  in  classification)  that  are  samples  of  the  automobile  class  might  have  a 
height  of  1.  All  features  that  are  of  none  of  the  classes  might  be  given  a  height  of  0. 
Classes  would  have  outputs  valued  sequentially  using  integers. 

For classification, the SHNN would be trained in the same way as for interpolation. The
training set would include all the feature vectors and the hyper-surface "height" that
corresponds to the appropriate class for each vector. Then, test vectors would be applied
and an output would be calculated. The difference here is that the SHNN final output node
would execute the output function int(output + 0.5), which rounds the output to the nearest
integer and so ensures an integer class value.
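
The output function can be written directly in Python, as in the sketch below. The function name is hypothetical. Because `int()` truncates toward zero, the expression rounds to the nearest integer only for non-negative outputs, which matches the class heights of 0, 1, 2, and so on assumed here.

```python
def class_from_output(output):
    """Round a non-negative raw network output to the nearest integer class value."""
    return int(output + 0.5)
```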

A  variation  on  this  idea  would  have  a  post-processor  layer  that  would  take  the  raw  SHNN 
output  and  determine  if  it  represented  a  known  class.  Here,  the  classes  would  not  have  to 
be  numbered  sequentially  (nor  even  be  integers).  Another  advantage  of  this  variation  is  that 
it  would  be  possible  to  have  an  output  value  between  two  class  values  but  not  close  enough 
to  either  to  say  it  belonged  to  one  class  or  the  other.  Thus,  the  opportunities  to  flag 
unknowns  would  increase. 
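
One possible form of such a post-processor is sketched below in Python. The function name, the example class values, and the tolerance of 0.25 are assumptions chosen for illustration; the raw output is matched to the nearest known class value and flagged as unknown (returned as `None`) when it is not close enough to any of them.

```python
def match_known_class(output, class_values, tolerance=0.25):
    """Return the known class value nearest to output, or None if none is close."""
    nearest = min(class_values, key=lambda value: abs(value - output))
    if abs(nearest - output) <= tolerance:
        return nearest
    return None
```

With class values of 0.0, 1.0, and 2.5, an output of 1.1 would be assigned to class 1.0, while an output of 1.75 lies too far from both 1.0 and 2.5 and would be flagged as unknown.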

The  above  approach  to  the  use  of  the  SHNN  in  classification  makes  little  change  to  the 
original  method.  However,  there  are  some  weaknesses: 

a)  It  is  not  possible,  without  transforming  the  coordinate  axes,  to  collect  "evidence" 
that  multiple  classes  are  indicated  by  the  same  feature  vector. 

b) A class with a low output value could be dominated by a spatially nearby class
that has a very high output value.

c)  Learning  in  real  time  would  be  difficult  if  not  unlikely. 

A solution is to create an SHNN with multiple outputs. Then connect the training sample
nodes  only  to  the  output  node(s)  for  the  class(es)  to  which  they  apply.  The  expected  output 
for  all  training  vectors  for  all  classes  would  then  be  fixed  at  some  value.  Class  selections 
would  be  based  on  the  highest-valued  output  node.  If  similarly  high  output  values  were 
produced  by  more  than  one  node,  then  an  ambiguity  would  result. 

The  least  squares  plane  fits  this  case  exactly.  There  is  no  difference  between  the  surface 
value  on  the  least  squares  plane  at  the  training  coordinates  and  the  expected  output  at  the 
training  coordinates.  Thus,  the  value  of  the  initial  Gaussian  peaks  would  be  zero.  The 
resulting  "hammered"  surface  would  simply  be  the  least  squares  plane  itself,  and  there  would 
be  no  surface  variability  that  could  be  used  to  separate  classes. 

A  solution  to  this  dilemma  is  to  "hammer"  to  the  zero  plane  instead  of  the  least  squares 
plane.  However,  the  difference  between  the  value  on  the  zero  plane  at  the  training 
coordinates  and  the  expected  output  at  the  training  coordinates  is  the  expected  output. 

Thus, the part of the training and the network used for "hammering" can be eliminated. The
Gaussian coefficients would all have the same value. The Gaussian functions would
generate the expected output at the training coordinates with the value decreasing as the
distance from the training coordinates increases. The expected output would be a value
that is the same for all classes and training vectors. The Gaussian coefficients are equal to
that expected value.

Columnar diagonal dominance might still be useful for determining the variances for each
training point’s Gaussian. It is possible that this technique could be used to ensure that the
Gaussians have small overlap. During some brief trials with this idea, we found that the
sum of the off-diagonal column elements needs to be fairly small. We had success with
sums of 0.05 and 0.1. All the training examples for all classes were placed into one large
matrix for the diagonal dominance calculation.
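
The multiple-output arrangement described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in this study: each training point contributes a Gaussian of the form expected_output * exp(-d^2 / variance) to the output node of its own class only, the per-point variances are supplied by the caller, and an ambiguity is declared when the second-highest output is within an assumed ratio of the highest. All names and the ratio of 0.9 are hypothetical.

```python
import math

def class_outputs(x, training_points, labels, variances, expected_output=1.0):
    """Sum each training point's Gaussian into the output node for its class."""
    outputs = {}
    for point, label, variance in zip(training_points, labels, variances):
        d2 = sum((a - b) ** 2 for a, b in zip(x, point))
        contribution = expected_output * math.exp(-d2 / variance)
        outputs[label] = outputs.get(label, 0.0) + contribution
    return outputs

def classify(x, training_points, labels, variances, ambiguity_ratio=0.9):
    """Pick the highest-valued output node, or None if the result is ambiguous."""
    outputs = class_outputs(x, training_points, labels, variances)
    ranked = sorted(outputs.items(), key=lambda item: item[1], reverse=True)
    best_label, best_value = ranked[0]
    if len(ranked) > 1 and ranked[1][1] >= ambiguity_ratio * best_value:
        return None
    return best_label
```

Adding a training point in this sketch only appends one entry to each list, which reflects the absence of any system of equations to solve.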

If  extra  training  sample  nodes  and  class  output  nodes  are  added  to  the  network,  real  time 
training  could  be  implemented  since  there  are  no  systems  of  equations  to  solve.  The 
variance  for  any  new  training  point  could  be  something  very  narrow,  the  average  of  the  two 
closest  original  training  points,  the  average  of  all  the  original  training  points,  or  something 
else  depending  on  the  situation. 

Specht’s  PNN  is  very  similar  to  the  Exponential  Neural  Network  (ENN)  outlined  above. 
Specht’s  radial  basis  functions  are  Gaussian,  but  he  varies  the  coefficient  to  ensure  a  volume 
of  unity  under  each  Gaussian.  The  PNN  uses  the  same  variance  at  each  training  point.  The 
ENN  subset  of  the  SHNN  maintains  a  Gaussian  coefficient  equal  to  the  expected  output  and 
places Gaussians with generally different variances at each training point.


14. Recommendations


Much  work  has  been  done,  by  various  researchers,  on  basis  function  methods  for 
programming  massive  arrays  of  parallel  processors.  This  work  should  be  consolidated  since 
it offers much in the way of logistical supportability. Many other currently popular methods
of programming these arrays are not logistically supportable.

A  serious  problem  with  the  SHNN  is  that  it  does  not  train  in  real  time  for  interpolation. 
Traditional interpolation methods train in real time and are as accurate as, or more accurate
than, the SHNN. The SHNN has the advantage of constant execution time, given true neural
hardware,  in  spite  of  increases  in  the  dimensions  and  the  numbers  of  training  vectors. 
Traditional  interpolation  methods  do  not  share  this  execution  advantage.  Efforts  should  be 
undertaken  to  develop  a  means  of  training  the  SHNN  in  real  time  for  interpolation. 

It  would  be  interesting  if  neural  network  interpretations  could  be  devised  for  traditional 
interpolation  schemes.  Such  interpretations  are  not  obvious  but  further  study  on  the  basic 
theories  behind  the  traditional  methods  might  reveal  them.  It  is  true  that  a  pipeline 
interpretation  is  fairly  easily  made  of  traditional  interpolation  methods.  However,  the  length 
of  the  pipe  tends  to  grow  as  does  the  number  of  dimensions.  Each  joint  on  the  pipe  takes 
longer  to  execute  as  the  number  of  training  vectors  grows.  The  advantage  of  the  pipeline 
is  that  the  joints  execute  in  parallel.  Upon  receiving  its  input  data,  a  joint  can  begin 
execution  while  earlier  joints  begin  preparing  the  next  round  of  input.  With  neural 
networks, the layers and the nodes in each layer execute in parallel. Like the pipeline joints,
a layer receives its input data and begins executing. Previous layers then begin preparing
the next round of input. Unlike the pipeline, the number of layers is fixed. As dimensions,
training vectors, and classes are added, the number of nodes in various layers grows. The
nodes in each layer always execute in parallel so there is never any change in the execution
time of each layer if true neural hardware is used. Also unlike the pipeline, the neural
network  remains  fixed  in  complexity.  As  the  pipe  grows  in  length  with  more  joints  added, 
a  more  sophisticated  control  method  is  needed.  Neural  networks  simply  change  weight 
values  from  zero  on  inter-node  connections  to  activate  additional  nodes. 

Only  the  least  squares  plane  and  Gaussian  radial  basis  functions  were  applied  to  the  SHNN 
in  this  study.  Explorations  were  briefly  undertaken  during  this  study  to  determine  if  other 
surfaces and functions would yield greater interpolation accuracy. These explorations should
continue. 

An  effort  to  pin  down  the  effect  of  off-diagonal  sums  of  less  than  1  should  be  made.  An 
experimental  approach  would  perform  tests  on  surfaces  of  increasing  complexity  while 
applying  to  each  surface  an  increasing  density  of  sampling.  For  each  surface,  at  each 
incremental density chosen, various off-diagonal sums, in a logical progression, could be
tried.

On very complex surfaces, the SHNN has accuracy comparable to traditional interpolation
schemes.  However,  we  did  not  have  the  computing  power  to  use  the  sampling  density 
necessary  to  achieve  CAD/CAM  accuracy  on  the  most  complex  surface.  When  it  becomes 
possible,  the  sampling  density  should  be  significantly  increased  until  CAD/CAM  accuracy 
is  reached.  The  goal  is  to  see  if  the  SHNN  needs  fewer  samples  than  traditional  methods 
need  to  reach  CAD/CAM  interpolation  accuracy  on  very  complex  surfaces.  To  this  end, 
the  authors  have  submitted  a  grant  proposal  for  time  on  the  Ohio  Super  Computer  Center’s 
Cray.  This  proposal  was  approved  after  the  research  period  that  produced  this  report. 